RFC 0001: Agent + cloud-server split #145
Merged
ankitgoswami merged 2 commits into main on May 7, 2026
🔐 Codex Security Review
Review Summary
Overall Risk: HIGH
Findings
- [HIGH] Enrollment flow does not authenticate the miner-signing key
- [HIGH] Per-org credential key gives any compromised agent org-wide secret access
- [MEDIUM] Unplanned agent loss has no non-destructive recovery path for Proto miners
Notes
Generated by Codex Security Review
69e41b0 to 3b65048
…plit) Adds a docs/rfcs/ directory with a README and template for the RFC process, plus the inaugural RFC 0001 proposing splitting fleetd into a thin on-prem agent and a cloud-deployable server, with a phased rollout that preserves today's combined-mode docker-compose deployment throughout. Includes the per-org symmetric encryption design for pool/miner credentials and the per-agent ed25519 design for Proto miner JWT signing, with the trade-offs explicitly documented. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3b65048 to 9849f6e
Pull request overview
Introduces an RFC process under docs/rfcs/ and adds the first design RFC (RFC 0001) proposing a future split between an on-prem agent and a cloud-deployable server mode for proto-fleet.
Changes:
- Add docs/rfcs/README.md describing when to write RFCs, lifecycle states, numbering, and format.
- Add docs/rfcs/_template.md to standardize RFC structure and metadata.
- Add docs/rfcs/0001-agent-server-split.md as a draft architectural proposal for an agent + server split with auth/credential model and phased rollout.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| docs/rfcs/README.md | Defines the RFC process, lifecycle, numbering, and usage guidance. |
| docs/rfcs/0001-agent-server-split.md | Draft RFC describing the agent/server split, security model, and rollout phases. |
| docs/rfcs/_template.md | Provides a standard RFC template for consistent structure and metadata. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Ankit Goswami <ankit.goswami@gmail.com>
flesher reviewed on May 4, 2026
mcharles-square (Collaborator) left a comment
Generally looks good
This was referenced May 4, 2026
ankitgoswami added a commit that referenced this pull request on May 5, 2026

Lands Phase 1 of RFC 0001 (the agent + cloud-server split): the wire protocol scaffold, agent identity schema, and a stub handler that returns CodeUnimplemented for every RPC. No behavior change in any deployment shape (combined / server / agent).

Wire protocol (proto/agentgateway/v1/agentgateway.proto):
Register, BeginAuthHandshake, CompleteAuthHandshake, UploadTelemetry, UploadEvents, UploadHeartbeat, ControlStream. Post-handshake RPCs derive agent identity from a session_token in Authorization metadata rather than any body field. RegisterRequest carries an operator-issued, org-scoped enrollment_token. buf.validate rules pin ed25519 key/signature lengths, bound api_key/session_token/name, cap opaque payloads at 1 MiB, require timestamps, and require a populated ControlStream oneof variant.

Handler (server/internal/handlers/agentgateway):
Embeds UnimplementedAgentGatewayServiceHandler. Registered on the shared mux and added to grpcreflect. All seven RPCs are in UnauthenticatedProcedures because the user-session AuthInterceptor cannot validate the agent's session_token; the handler is responsible for credential validation when implemented.

Logging redaction (server/internal/handlers/interceptors):
Handshake request/response procedures added to redaction lists; ControlStream, UploadTelemetry, UploadEvents added to SensitiveBodyProcedures. The streaming logger now suppresses per-message bodies for sensitive procedures.

Schema (server/migrations/000039_create_agent_tables):
agent_device is the single source of truth for device-to-agent ownership; combined mode is the absence of a row. agent_device carries (agent_id, device_id, org_id) with composite FKs to both agent(id, org_id) and device(id, org_id), so cross-tenant pairings are rejected by the DB. agent identity_pubkey and (org_id, name) uniqueness are partial indexes scoped to deleted_at IS NULL, so a soft-deleted agent does not block re-enrollment. created_at and updated_at are NOT NULL.

--mode flag (server/cmd/fleetd/config.go):
Accepted via kong's enum validator (server/agent/combined, default combined). Not yet load-bearing.

Closes #157
Refs #145
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
6 tasks
ankitgoswami added a commit that referenced this pull request on May 6, 2026

Server foundation for issue #158. Lands the operator and agent paths of the enrollment flow end-to-end against the database; the fleet-agent CLI binary and the operator UI are deferred to follow-up PRs.

Schema (migration 000040):
- api_key gains agent support: user_id nullable, new agent_id, new subject_kind ('user'|'agent'), CHECK that exactly one of user_id / agent_id is set. Existing rows back-fill to subject_kind='user'.
- New pending_enrollment table holds operator-issued bootstrap codes. State machine: PENDING -> AWAITING_CONFIRMATION -> CONFIRMED, with EXPIRED/CANCELLED terminal failure states. Plaintext is shown to the operator once and never persisted; only the SHA-256 hash is stored.
- New agent_auth_challenge table holds short-TTL handshake nonces; atomic DELETE ... RETURNING gives replay safety without a consumed_at.
- New agent_session table holds short-lived bearer tokens issued by CompleteAuthHandshake; mirrors the user-side session table.

Domain:
- agentenrollment.Service implements code lifecycle and the Register + Confirm transitions. Confirm flips pending_enrollment to CONFIRMED, marks agent.enrollment_status='CONFIRMED', and issues the agent's api_key via apikey.Service.CreateAgent.
- agentauth.Service implements the BeginHandshake / CompleteHandshake / ResolveSession state machine. BeginHandshake verifies the api_key, cross-checks the supplied identity_pubkey against the enrolled key, and mints a one-shot challenge. CompleteHandshake atomically consumes the challenge and verifies the ed25519 signature against the agent's identity key, minting a session_token on success.
- apikey.Service / store gain a CreateAgent path; existing user-key flows stay unchanged. Validate keeps a single signature; callers branch on SubjectKind.

Auth context + interceptor:
- agentauth.Subject is the typed value placed on ctx by AgentAuthInterceptor (mirrors session.Info for users, via connectrpc.com/authn).
- AgentAuthInterceptor only fires on AgentAuthenticatedProcedures (Upload* and ControlStream); the user-session AuthInterceptor short-circuits those so the two interceptors don't fight.
- UnauthenticatedProcedures now contains only the bootstrap RPCs: Register / BeginAuthHandshake / CompleteAuthHandshake.

Handlers:
- AgentGatewayService.Register / BeginAuthHandshake / CompleteAuthHandshake stop returning Unimplemented and delegate to the new domain services.
- New AgentAdminService (proto/agentadmin/v1) gives operators CreateEnrollmentCode, ListAgents, ConfirmAgent. Authorized via the existing user-session AuthInterceptor; org_id resolved from session info.

Tests:
- Integration tests against a real timescaledb cover the happy path (create code -> register -> confirm -> handshake -> resolve session) plus the security cases called out in the AC: replay of a consumed code, expired code, replayed challenge, and identity_pubkey mismatch.

Out of scope (follow-up PRs):
- cmd/fleet-agent/ Go binary with `enroll` subcommand.
- Operator UI: Agents settings page + EnrollAgentModal.

Refs #158
Refs #145
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4 tasks
ankitgoswami added a commit that referenced this pull request on May 6, 2026

Add the agent-side CLI that drives the bootstrap flow against the already-merged enrollment + handshake server (PR #181). Supports three subcommands: enroll, status, refresh.

enroll generates ed25519 identity + miner-signing keypairs, calls Register against AgentGatewayService, prints the local fingerprint for operator visual comparison, accepts the api_key the operator pastes back after confirming in the UI, runs the BeginAuth/CompleteAuth handshake, and persists everything to a 0600 YAML state file under $XDG_STATE_HOME/fleet-agent (or ~/.local/state/fleet-agent by default).

refresh re-runs the handshake against the stored api_key.

status reads back the local state.

Tests are wire-level: they spin up an httptest server with the real connect handler against a fake AgentGateway that verifies the signature with ed25519.Verify, mirroring the server.

Refs: #158, #145
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
flesher approved these changes on May 6, 2026
Summary
- Adds an RFC process (docs/rfcs/) with a README and template.
- RFC 0001 proposes splitting fleetd into a thin on-prem fleet-agent and a cloud-deployable fleetd --mode=server, with a phased rollout that keeps today's docker-compose deploy (--mode=combined) working throughout.
- … (AgentGatewayService), per-org credential model, schema impact, drawbacks, alternatives considered, and 6 phases with mermaid diagrams.

This is a design RFC, not implementation. Land it as draft so review can happen on the markdown; subsequent PRs implement phases 1-6.

Test plan
- … (../../server/...) resolve to existing source files at the cited line numbers
- … (#credentials, #phase-4-...) navigate correctly
- … draft -> accepted

🤖 Generated with Claude Code