Skip to content

feat(rfc-0001): gateway proto + agent schema + stub handler [phase 1]#170

Merged
ankitgoswami merged 1 commit intomainfrom
ankitg/rfc-0001-phase-1-gateway-scaffold
May 5, 2026
Merged

feat(rfc-0001): gateway proto + agent schema + stub handler [phase 1]#170
ankitgoswami merged 1 commit intomainfrom
ankitg/rfc-0001-phase-1-gateway-scaffold

Conversation

@ankitgoswami
Copy link
Copy Markdown
Contributor

@ankitgoswami ankitgoswami commented May 4, 2026

Summary

Lands Phase 1 of RFC 0001 — the wire-protocol scaffold and schema for the agent + cloud-server split. No behavior change in any deployment shape (combined / server / agent).

  • New proto/agentgateway/v1/agentgateway.proto defines AgentGatewayService with Register, BeginAuthHandshake, CompleteAuthHandshake, UploadTelemetry, UploadEvents, UploadHeartbeat, and a bidi ControlStream. Post-handshake RPCs derive agent identity from a session token in Authorization metadata, not body fields.
  • New server/internal/handlers/agentgateway embeds the generated UnimplementedAgentGatewayServiceHandler, so every RPC returns CodeUnimplemented. The service is registered on the mux and added to grpcreflect alongside the pairing and telemetry services. All seven RPCs are added to UnauthenticatedProcedures so the user-session interceptor doesn't reject them; Phase 2 will replace this with a dedicated agent-auth interceptor.
  • New migration 000039_create_agent_tables adds agent and agent_device. agent_device is the single source of truth for device→agent ownership, carrying (agent_id, device_id, org_id) with composite FKs against agent(id, org_id) and device(id, org_id) so the DB enforces tenant alignment. Combined mode is the natural absence of a row in agent_device; the four leaf tables that Phase 2 will route through stay at their current shapes.
  • --mode flag accepted in server/cmd/fleetd/config.go via kong's enum validator (server / agent / combined, default combined). Not yet load-bearing.
  • Log redaction: handshake request/response procedures added to RedactedRequestProcedures/RedactedResponseProcedures; ControlStream, UploadTelemetry, UploadEvents added to SensitiveBodyProcedures and the streaming logger now suppresses per-message bodies for sensitive procedures.

Building scaffold

What is intentionally not in this PR

  • No RPC bodies. All return Unimplemented until Phase 2.
  • No code branches on --mode. Combined-mode behavior is unchanged.
  • No new sqlc queries against agent or agent_device. Phase 2 will add them when the agent binary actually starts using them.

Test plan

  • go build ./... clean from server/
  • go test ./... -skip ./generated/... -count=1 -short passes against the local timescaledb
  • Migration applies cleanly via migrate up. Verified agent and agent_device exist and device carries uq_device_id_org_id; the four leaf tables show no new columns or constraints.
  • --mode flag appears in fleetd --help with default combined; invalid values rejected with --mode must be one of "server","agent","combined"
  • golangci-lint run and buf lint clean
  • TODO before ready-for-review: full just dev canary to confirm virtual miner telemetry still flows. Requires Docker plugin build; not exercised on this branch.

Closes #157
Refs #145

🤖 Generated with Claude Code

@ankitgoswami ankitgoswami added rfc-0001 RFC 0001: agent + cloud-server split phase-1 RFC 0001 Phase 1 labels May 4, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 4, 2026

🔐 Codex Security Review

Note: This is an automated security-focused code review generated by Codex.
It should be used as a supplementary check alongside human review.
False positives are possible - use your judgment.

Scope summary

  • Reviewed pull request diff only (e638366029e663e619758d1075688a540f0103dc...24ec5c0c09b7f4a7ac5f0e7b6cdc0984e7217a58, exact PR three-dot diff)
  • Model: gpt-5.4

💡 Click "edited" above to see previous reviews for this PR.


Review Summary

Overall Risk: MEDIUM

Findings

[MEDIUM] MODE creates a false security and deployment boundary

  • Category: Infrastructure
  • Location: server/cmd/fleetd/config.go:31
  • Description: The new MODE setting advertises server, agent, and combined, but there are no config.Mode call sites. Startup still unconditionally connects to Postgres, runs migrations, loads plugins, and mounts the full RPC surface. In practice, MODE=agent does not reduce the exposed surface at all.
  • Impact: An operator can reasonably deploy a host as “agent-only” and still expose the normal fleet/auth/onboarding endpoints. On a fresh node, that includes unauthenticated onboarding flow, so this is not just a docs bug; it weakens the intended trust boundary and can also cause agent-only deployments to fail because they still require the full server dependency set.
  • Recommendation: Enforce config.Mode in start() before DB/plugin initialization and before route registration. If mode-specific behavior is not implemented yet, reject server/agent at startup instead of silently behaving like combined.

[MEDIUM] The migration adds a blocking uniqueness constraint to the hot device table during automatic startup

  • Category: SQLi/Database
  • Location: server/migrations/000039_create_agent_tables.up.sql:1
  • Description: The migration starts with ALTER TABLE device ADD CONSTRAINT uq_device_id_org_id UNIQUE (id, org_id);. That forces a new unique index build on an existing production table before the new agent tables are created. In this repo, migrations run automatically during fleetd startup.
  • Impact: On a non-trivial fleet, rollout can stall or block writes while the new uniqueness constraint is built, turning a normal deploy into an availability event.
  • Recommendation: Pre-stage the required uniqueness with a concurrent/manual migration path, or redesign the FK strategy so a new composite unique constraint is not introduced on device in the automatic startup migration path.

Notes

The new AgentGatewayService is currently only scaffolding: it is mounted publicly, but the handler is still unimplemented, so registration/auth/telemetry/control flows will all return Unimplemented.

Review is based on the diff and surrounding code inspection only. I did not run tests or migrations in this read-only environment.


Generated by Codex Security Review |
Triggered by: @ankitgoswami |
Review workflow run

ankitgoswami added a commit that referenced this pull request May 4, 2026
Addresses security review feedback on PR #170 and the failing
generated-code-check CI job.

Schema (migration 000039):
- agent and device gain composite UNIQUE (id, org_id) so other tables
  can build (id, org_id) FKs against them.
- agent_device, device.agent_id, discovered_device.agent_id all use
  composite FKs against agent(id, org_id), so the DB rejects any binding
  that crosses tenants.
- queue_message gains a denormalized org_id BIGINT NULL with a CHECK that
  agent_id requires org_id, plus a composite FK to agent(id, org_id).
- command_batch_log reuses its existing organization_id (000036) for the
  composite FK and gains the same agent_id-requires-org CHECK.

Logging (server/internal/handlers/interceptors):
- agentgateway Register / BeginAuthHandshake / CompleteAuthHandshake /
  UploadHeartbeat are added to RedactedRequestProcedures.
- CompleteAuthHandshake response is added to RedactedResponseProcedures.
- ControlStream, UploadTelemetry, UploadEvents are added to
  SensitiveBodyProcedures so unary logs suppress the body entirely.
- request_logging.go: extend loggingStreamingHandlerConn to honor
  SensitiveBodyProcedures, suppressing per-message bodies on
  Receive/Send for sensitive streaming RPCs.

Authentication (server/internal/handlers/interceptors/config.go):
- All seven agent-gateway RPCs added to UnauthenticatedProcedures so
  the handler is reachable as advertised (returns CodeUnimplemented
  rather than CodeUnauthenticated). Comment explicitly notes Phase 2
  will replace this with a dedicated agent-auth interceptor.

Generated code:
- goimports applied to agentgateway pb.go and connect.go (the diff CI's
  generated-code-check was flagging).
- sqlc regenerated; AgentDevice model gains OrgID, QueueMessage gains
  OrgID (sql.NullInt64).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ankitgoswami added a commit that referenced this pull request May 4, 2026
…ment_token

Addresses additional security review feedback on PR #170.

[MEDIUM] Upload RPCs were trusting a caller-supplied agent_id rather than
the handshake-derived session token. The proto now drops agent_id from
UploadTelemetryRequest, UploadEventsRequest, UploadHeartbeatRequest, and
ControlHello. The new file-level comment block specifies the auth
contract: every post-handshake RPC carries the session_token in
Authorization metadata, and server-side handlers derive the effective
agent identity (and tenant) from the validated token. Spoofing
agent_id is no longer expressible on the wire.

[MEDIUM] Register had no tenant-binding input despite the org-scoped
agent schema. RegisterRequest now carries enrollment_token (operator-
issued, single-use, org-scoped) as field 1; existing fields shift down
by one. The field is the org binding for the agent before the handshake
that issues an api_key.

Note: these are field-number changes to a Phase 1 stub with no real
clients yet. The wire shape is intentionally locked in before Phase 2
implementation begins so consumer code never has to migrate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added javascript Pull requests that update javascript code client labels May 4, 2026
@ankitgoswami ankitgoswami marked this pull request as ready for review May 5, 2026 16:23
@ankitgoswami ankitgoswami requested a review from a team as a code owner May 5, 2026 16:23
Copilot AI review requested due to automatic review settings May 5, 2026 16:23
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Phase 1 scaffold for RFC-0001’s agent↔server split: introduces the AgentGateway protobuf surface, wires it into the server mux/reflect, adds initial DB schema for agents and device ownership, and tightens request/stream logging redaction for the new RPCs.

Changes:

  • Added AgentGatewayService proto definition and regenerated Go/TS client/server stubs.
  • Registered a stub (unimplemented) AgentGateway handler on the server, and added reflection + logging/redaction configuration for the new procedures.
  • Added migration creating agent / agent_device tables (and required composite uniqueness on device(id, org_id)).

Reviewed changes

Copilot reviewed 8 out of 13 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
server/sdk/v1/python/proto_fleet_sdk/generated/pb/driver_pb2_grpc.py Regenerated Python grpc stub metadata version constant.
server/migrations/000039_create_agent_tables.up.sql Adds agent/agent_device tables + device composite uniqueness needed for tenant-aligned FKs.
server/migrations/000039_create_agent_tables.down.sql Rollback for the new agent tables and device constraint.
server/internal/handlers/interceptors/request_logging.go Suppresses per-message streaming bodies for procedures marked sensitive.
server/internal/handlers/interceptors/config.go Adds agentgateway procedures to redaction/sensitivity configuration and unauth allowlist.
server/internal/handlers/agentgateway/handler.go New stub handler embedding Connect’s unimplemented service.
server/generated/sqlc/models.go Regenerated sqlc models for new agent tables.
server/generated/grpc/agentgateway/v1/agentgatewayv1connect/agentgateway.connect.go Regenerated Connect handlers/clients for AgentGatewayService.
server/generated/grpc/agentgateway/v1/agentgateway.pb.go Regenerated Go protobuf types for agentgateway.
server/cmd/fleetd/main.go Registers AgentGateway handler on mux and adds it to reflection.
server/cmd/fleetd/config.go Adds --mode enum flag to configuration (server/agent/combined).
proto/agentgateway/v1/agentgateway.proto New agentgateway service and message schema (handshake + telemetry/events + control stream).
client/src/protoFleet/api/generated/agentgateway/v1/agentgateway_pb.ts Regenerated TS protobuf types for agentgateway.

Comment thread server/internal/handlers/interceptors/config.go
Comment thread server/migrations/000039_create_agent_tables.up.sql Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3869df410e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread server/internal/handlers/interceptors/config.go
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aa0a96a514

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread server/migrations/000039_create_agent_tables.up.sql Outdated
Lands Phase 1 of RFC 0001 (the agent + cloud-server split): the wire
protocol scaffold, agent identity schema, and a stub handler that
returns CodeUnimplemented for every RPC. No behavior change in any
deployment shape (combined / server / agent).

Wire protocol (proto/agentgateway/v1/agentgateway.proto):
  Register, BeginAuthHandshake, CompleteAuthHandshake, UploadTelemetry,
  UploadEvents, UploadHeartbeat, ControlStream. Post-handshake RPCs
  derive agent identity from a session_token in Authorization metadata
  rather than any body field. RegisterRequest carries an
  operator-issued, org-scoped enrollment_token. buf.validate rules pin
  ed25519 key/signature lengths, bound api_key/session_token/name,
  cap opaque payloads at 1 MiB, require timestamps, and require a
  populated ControlStream oneof variant.

Handler (server/internal/handlers/agentgateway):
  Embeds UnimplementedAgentGatewayServiceHandler. Registered on the
  shared mux and added to grpcreflect. All seven RPCs are in
  UnauthenticatedProcedures because the user-session AuthInterceptor
  cannot validate the agent's session_token; the handler is responsible
  for credential validation when implemented.

Logging redaction (server/internal/handlers/interceptors):
  Handshake request/response procedures added to redaction lists;
  ControlStream, UploadTelemetry, UploadEvents added to
  SensitiveBodyProcedures. The streaming logger now suppresses
  per-message bodies for sensitive procedures.

Schema (server/migrations/000039_create_agent_tables):
  agent_device is the single source of truth for device-to-agent
  ownership; combined mode is the absence of a row. agent_device
  carries (agent_id, device_id, org_id) with composite FKs to both
  agent(id, org_id) and device(id, org_id), so cross-tenant pairings
  are rejected by the DB. agent identity_pubkey and (org_id, name)
  uniqueness are partial indexes scoped to deleted_at IS NULL, so a
  soft-deleted agent does not block re-enrollment. created_at and
  updated_at are NOT NULL.

--mode flag (server/cmd/fleetd/config.go):
  Accepted via kong's enum validator (server/agent/combined, default
  combined). Not yet load-bearing.

Closes #157
Refs #145

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ankitgoswami ankitgoswami force-pushed the ankitg/rfc-0001-phase-1-gateway-scaffold branch from 36d8a08 to 24ec5c0 Compare May 5, 2026 17:31
Comment thread proto/agentgateway/v1/agentgateway.proto
@ankitgoswami ankitgoswami merged commit e51201b into main May 5, 2026
66 checks passed
@ankitgoswami ankitgoswami deleted the ankitg/rfc-0001-phase-1-gateway-scaffold branch May 5, 2026 19:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

client javascript Pull requests that update javascript code phase-1 RFC 0001 Phase 1 rfc-0001 RFC 0001: agent + cloud-server split server shared

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RFC 0001-P1: Gateway proto + agent schema + stub handler

3 participants